Natural language processing based automated system for UML diagrams generation
Abstract
This paper presents a natural language processing based automated system that generates UML diagrams by analyzing business details given as text. A new model is presented for analyzing natural language and extracting the relevant and required information from the storyline given by the user. The user writes the requirements in simple English in a few paragraphs, and the designed system analyzes the given script. After compound analysis and extraction of the associated information, the designed system draws various UML diagrams such as activity diagrams, sequence diagrams and class diagrams. Conventional CASE tools require a lot of extra time and effort from the system analyst while creating, arranging, labeling and finishing the UML diagrams. The designed system provides a quick and reliable way to generate UML diagrams, saving time and budget for both the user and the system analyst.

Introduction

The look and style of software engineering have changed completely in recent times. These days each step of software engineering follows the rules of object-oriented design patterns. All phases of software engineering are moving away from the old conventions, and new paradigms are more popular. The same is the case with the software analysis process, which uses the Unified Modeling Language to map and model the user requirements. Analysis is the key process in building modern information system applications and the basis for a robust design and development of the software application. There are various object-oriented modeling languages and tools. The Unified Modeling Language (UML) is one of the best-known languages for the object-oriented analysis and design of software applications. UML is a standard language used to identify, visualize, develop and document the components of software systems. Additionally, it is used for modeling and mapping business logic and other non-software systems.
Large and complex systems can easily be modeled using UML, as it is a very important part of developing object-oriented software and of the software development process. Like other conventional methodologies, UML uses graphical notations to represent and depict the design and flow of software projects. At present, the only software that draws UML diagrams efficiently is tools such as Rational Rose and Smart Draw; these are reasonably good tools, but they have many disadvantages. By convention, the system analyst has to do a lot of work deducing the business logic and understanding the user requirements before drawing the UML diagrams with conventional CASE tools. Hence, much time is wasted because the available CASE tools are tedious to use for this scenario. Today everybody needs quick and reliable service, so intelligent software was needed for generating UML-based documentation to save time and budget for both the user and the system analyst.

Description of Problem

A few years ago, data flow diagrams (DFDs) were used to symbolize the flow of data and represent the user's requirements. Today, the Unified Modeling Language is used to model and map the user requirements; it is a more comprehensive and authentic way of representation and is beneficial for the later stages of software development. The problem addressed in this research relates primarily to the software analysis and design phase of the software development process. The software currently on the market that provides this facility consists of paint-like tools such as Visual UML, GD Pro, Smart Draw and Rational Rose. All of them are tedious to use. Working with the heavily overloaded interfaces of these CASE tools is a vexing problem.
Generating UML diagrams with these software engineering tools is a difficult, lengthy and time-consuming process. Therefore, anyone involved in software development needs a way to obtain the required output with maximum accuracy in minimum time.

Natural language processing, 18th National Computer Conference 2006 © Saudi Computer Society

Proposed Solution

Object-oriented modeling with less time and effort is a significant requirement. To resolve these issues and provide a robust solution, a helpful framework is required that can facilitate and assist both users and software engineers. The functionality of the conducted research is domain specific, but it can easily be enhanced in the future according to the requirements. The current system maps user requirements by reading them in plain text and drawing a set of UML diagrams: class diagram, activity diagram, sequence diagram, use case diagram and component diagram. An Integrated Development Environment would also be provided for user interaction and efficient input and output.

Object-Oriented Analysis and Design

Analysis and design of an information system is about understanding the task and devising the framework that accomplishes the actual job. Typically, design is about managing and controlling complexity in a domain. A robust design method also helps to split big tasks into manageable pieces (Condamines, 2001). In software engineering, design methods provide various notations, usually graphical ones. These notations make it possible to record and communicate design decisions. Object-oriented design has superseded traditional analysis and design techniques such as structured design and data-driven design (Androutsopoulos, 1995).
Compared to older design paradigms, object-oriented design models every active entity of the problem domain using the concept of objects. Objects have:
• State (shape and condition)
• Behaviour (what they perform)
Object-oriented languages use variables to represent the state of an object and methods or procedures to implement its behaviour. For example, a ball could be an object. Its state has parameters such as colour, size, diameter, shape and type. The object can also have behaviours such as throw, roll, catch and hit. The major task in the analysis and design phase is to identify the valid objects and specify their states and behaviours. In conventional methods, the system analyst performs this tough job and then maps the information into UML using a graphical tool such as Visio or Rational Rose.

In the context of this research, objects are identified automatically from the problem domain. The user provides input text in English related to the business domain. After lexical analysis of the text, syntax analysis is performed at the word level to recognize each word's category (Androutsopoulos, 1995). First, the available lexicons are categorized into nouns, pronouns, prepositions, adverbs, articles, conjunctions, etc. The syntactic analysis has to be able to isolate subjects, verbs, objects, adverbs, adjectives and various other complements. It is a somewhat complex, multipart procedure. Consider the sentence:

"Zia is playing with the red ball."

For this example, the output is as follows.

Lexicons   Phase-I        Phase-II
Zia        Noun           Object
is         Helping-Verb   ------
playing    Verb           Method
with       Preposition    ------
the        Article        ------
red        Adjective      Attribute
ball       Noun           Object

This is the final output of the lexical assessment phase: all nouns are marked as objects, all verbs as methods, and all adjectives as states of the particular object. In the above example, 'Zia' and 'ball' are objects, 'playing' is the method of the object Zia, and 'red' is a state of the object ball.

Natural Language Processing

The understanding and multi-aspect processing of natural languages, also termed "speech languages", is one of the topics of greatest interest in the field of artificial intelligence (Strzalowski, 1995). Natural languages are irregular and asymmetrical. Traditionally, natural languages are based on informal grammars. Geographical, psychological and sociological factors influence the behaviour of natural languages (Losee, 1996). Their set of words is not fixed, and it changes from area to area and over time.
Due to these variations and inconsistencies, natural languages come in different flavours; English, for instance, has more than half a dozen well-known flavours around the world. These flavours differ in accent, vocabulary and phonology. These discrepancies and inconsistencies make natural languages more difficult to process than formal languages (Krovetz, 1992). In analyzing and understanding natural languages, researchers usually face various problems. The problems connected to the greater complexity of natural language include verb conjugation, inflexion, lexical amplitude and ambiguity. Of these, ambiguity causes the most difficulty. Ambiguity can be resolved at the syntactic and semantic levels by using a sound and robust rule-based system.

Used Methodology

Conventional natural language processing systems are rule based. Agents are another way to develop speech language based systems (Krovetz, 1992). In this research, a rule-based algorithm has been designed and used that can read, understand and extract the desired information. First, the basic elements of the language grammar, such as verbs, nouns and adjectives, are extracted (Drouin, 2004); further processing is then performed on the basis of this extracted information. In linguistic terms, verbs often specify actions, and noun phrases the objects that participate in the action (Zelle, 1993). Each noun phrase's thematic role then specifies how the object participates in the action.
In the following example, Ali is the agent:

"Ali is writing a letter with a pen."

A procedure that understands such a sentence must discover that Ali is the agent because he performs the action of writing, that the letter is the thematic object because it is the object that is written, and that the pen is an instrument because it is the tool with which the writing is done (Gómez-Pérez, 2005). Thus, complete sentence analysis finds information about the agent, co-agent, thematic object, beneficiary, etc. Identifying this information helps to understand the meaning of the input sentence, as described below.

Agent: The agent causes the action to occur, as in "Ahmed hit the ball," where Ahmed is the agent who performs the task. In a passive sentence, the agent may also appear as in "The ball was hit by Ahmed."
Co-agent: If the agent works with a partner, that partner is called the co-agent. Both carry out the action together, as in "Ahmed played tennis with Ali."
Beneficiary: The beneficiary is the person for whom an action has been performed: "Ahmed brought the balls for Ali." In this sentence Ali is the beneficiary.
Thematic object: The thematic object is the object the sentence is really all about, typically the object undergoing a change. Often the thematic object is the same as the syntactic direct object, as in "Ahmed hit the ball." Here the ball is the thematic object.
Conveyance: The conveyance is something in which or on which the agent travels: "Ahmed goes by train."
Trajectory: Motion from source to destination takes place over a trajectory. In contrast to the other role possibilities, several prepositions can serve to introduce trajectory noun phrases: "Ahmed and Ali went to London from Islamabad."
Location: The location is where an action occurs. Several prepositions can introduce the location, usually a noun phrase, as in "Ali studied in the library, at a desk, by the wall, under a picture, near the door."
Time: Time specifies when an action occurs. Prepositions such as at, before and after introduce noun phrases that indicate time, as in "Ahmed and Ali left before evening."
Duration: Duration specifies how long an action takes. Prepositions such as since and for indicate duration: "Ahmed and Ali walked for an hour."

Architecture of Designed System

The designed UMLG system draws UML diagrams after reading the text scenario provided by the user. The system works in five modules: text input acquisition, syntactic analysis, text understanding, knowledge extraction and, finally, generation of UML diagrams, as shown in Figure 1.

i. Text input acquisition
This module acquires the input text scenario. The user provides the business scenario in the form of paragraphs of text. The module reads the input text as characters and generates the words or lexicons (Tang, 2001) by concatenating the input characters. This module is the implementation of the lexical phase. Language-specific lexicons, tokens or symbols are generated in this module.

ii. Syntactic Analysis
This is the second module of the designed framework, and it reads the input from module one in the form of words. These words are categorized into various classes such as verbs, helping verbs, nouns, pronouns, adjectives, prepositions and conjunctions (Fagan, 1989) on the basis of the defined categorization rules. The set of rules is defined on the basis of standard English grammatical rules, also called parts of speech conventions.

iii. Text Understanding
This module reads the input from module 1 in the form of words. The meaning of the given text is inferred in this module using semantic rules (Malaisé, 2005). These words are categorized into various classes such as verbs, helping verbs, nouns, pronouns, adjectives, prepositions and conjunctions.

iv. Knowledge extraction
Required data attributes are extracted in this module (Rijsbergen, 1977) according to the given guidelines.
This module extracts the different objects and classes and their respective attributes on the basis of the input provided by the preceding module. Nouns are symbolized as classes and objects, and their associated characteristics are termed attributes.

v. UML diagram generation
This is the last module. It uses UML symbols and draws the various UML diagrams by combining the available symbols according to the information extracted by the previous module.

Figure 1. Architecture of the Natural Language Processing based Automated System for UML Diagrams Generation
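The five modules described above can be illustrated with a small sketch. This is not the authors' implementation. The tiny lexicon, the function names, the rule that attaches adjectives to the following noun and verbs to the preceding noun, and the PlantUML-style text output are all assumptions made here for illustration; the paper itself does not state the output format or the exact rules.

```python
import re

# Hypothetical mini-lexicon (word -> word category); illustrative only.
LEXICON = {
    "zia": "noun", "ball": "noun",
    "is": "helping-verb", "playing": "verb",
    "with": "preposition", "the": "article", "red": "adjective",
}


def acquire(text):
    """Module i: split the input text into word tokens."""
    return re.findall(r"[A-Za-z]+", text)


def categorize(words):
    """Module ii: attach a word category to each token via the lexicon."""
    return [(w, LEXICON.get(w.lower(), "unknown")) for w in words]


def understand(tagged):
    """Module iii: map word categories to object-oriented roles."""
    role_of = {"noun": "object", "verb": "method", "adjective": "attribute"}
    return [(w, role_of[c]) for w, c in tagged if c in role_of]


def extract(roles):
    """Module iv: group attributes and methods under classes.

    Assumed rule: a method belongs to the nearest preceding object,
    an attribute belongs to the next object that follows it.
    """
    classes = {}
    current = None
    pending_attributes = []
    for word, role in roles:
        if role == "object":
            name = word.capitalize()
            current = classes.setdefault(
                name, {"attributes": [], "methods": []}
            )
            current["attributes"].extend(pending_attributes)
            pending_attributes = []
        elif role == "attribute":
            pending_attributes.append(word)
        elif role == "method" and current is not None:
            current["methods"].append(word)
    return classes


def generate(classes):
    """Module v: emit a PlantUML-style class diagram as plain text."""
    lines = ["@startuml"]
    for name, members in classes.items():
        lines.append(f"class {name} {{")
        lines.extend(f"  {a}" for a in members["attributes"])
        lines.extend(f"  {m}()" for m in members["methods"])
        lines.append("}")
    lines.append("@enduml")
    return "\n".join(lines)


def text_to_uml(text):
    """Run the five modules in sequence."""
    return generate(extract(understand(categorize(acquire(text)))))


if __name__ == "__main__":
    print(text_to_uml("Zia is playing with the red ball."))
```

For the sentence "Zia is playing with the red ball." the sketch yields a class Zia with the method playing() and a class Ball with the attribute red. A realistic system would need a full lexicon, handling of ambiguity, and rules for relationships between classes, none of which are attempted here.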